leaderboard: per-dataset metric derivation, chip filters + metric toggle on every table #29
Merged
…ric toggle on every table

- per-dataset shard reads metrics from actual runs (no MAP/recall_1000 phantom columns)
- shared FilterChips + MatrixCell components reused across home / dataset / method / model / retriever pages
- every per-X table gets chip filters (method/model/retriever/metric as applicable) + metric toggle
- pretty metric labels (nDCG@10, R@1k, R@100, MAP) everywhere
- drop double scrollbar on home + per-dataset tables
- /models index renders display label, not provider-prefixed id
- /runs page shows method display name; reproduce snippet aligned to example pipeline with correct Pyserini index names and qrels-based trec_eval
- /about page no longer claims run.txt/queries.tsv are guaranteed; path includes retriever segment

Co-Authored-By: Claude Opus 4.7 <[email protected]>
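As a rough illustration of the first bullet (metric columns read from actual runs rather than from a registry list), a minimal sketch might look like the following. The `RunRow` shape, the `deriveMetricColumns` name, and the preferred ordering are assumptions made for this example, not the PR's actual code.

```typescript
// Hypothetical row shape; the real shard schema is not shown in this PR.
interface RunRow {
  run_id: string;
  metrics: Record<string, number | undefined>;
}

// Assumed display order; metrics outside this list sort after it, alphabetically.
const PREFERRED_ORDER: string[] = ["ndcg_cut_10", "recall_1000", "recall_100", "map"];

// Collect only the metric ids that at least one run actually reports,
// so a metric listed in a registry but absent from the data never becomes a column.
function deriveMetricColumns(rows: RunRow[]): string[] {
  const seen = new Set<string>();
  for (const row of rows) {
    for (const name of Object.keys(row.metrics)) {
      const value = row.metrics[name];
      if (typeof value === "number" && isFinite(value)) {
        seen.add(name);
      }
    }
  }
  const rank = (metric: string): number => {
    const i = PREFERRED_ORDER.indexOf(metric);
    return i === -1 ? PREFERRED_ORDER.length : i;
  };
  return Array.from(seen).sort((a, b) => rank(a) - rank(b) || a.localeCompare(b));
}
```

With this approach a dataset whose runs only report `ndcg_cut_10` and `recall_100` gets exactly those two columns, whatever the registry lists.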
…ed filter card

- Wrap every table in a fixed-height card with a styled 8px thin scrollbar so the page chrome stays in view while rows scroll
- Sticky thead inside the scroll container; sticky leftmost axis columns (Method/Model/Retriever, varies per page) with CSS-var-driven widths and a mobile fallback
- Inline sort arrows on stacked dataset/metric column headers via a slot the table wires into
- Filter chips moved into a dedicated card; metric toggle now also re-fires the current sort so row order matches the visible metric
- MatrixCell always renders both metric spans (em-dash for missing) and uses the new .qg-cell-best highlight (accent + dark-mode glow)
- Decimal precision unified at 4 across MatrixCell, side-by-side dataset cells, and the run-detail metrics table
- /datasets/[id] renders both metrics side by side instead of a single-column toggle
- /datasets/ index drops the stale eval_metrics badge
- /runs/[run_id] reproduce snippet simplifies the qrels lookup
- Stat cards gain hover:border-qg-accent; InteractiveTable search input restyled with magnifier icon; MetricCell removed (dead code)

Co-Authored-By: Claude Opus 4.7 <[email protected]>
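The cell rules in this commit (both metric spans always rendered, a dash placeholder for missing values, four decimals everywhere) can be sketched as a small formatting helper. The function names and the two-element return shape are hypothetical; the actual MatrixCell markup is not shown here.

```typescript
// Placeholder for a missing value (written as an escape: U+2014).
const MISSING_PLACEHOLDER = "\u2014";

// Single shared precision, so every table formats scores the same way.
const SCORE_PRECISION = 4;

// Format one metric value, falling back to the placeholder when it is absent.
function formatScore(value: number | null | undefined): string {
  if (value === null || value === undefined || !isFinite(value)) {
    return MISSING_PLACEHOLDER;
  }
  return value.toFixed(SCORE_PRECISION);
}

// Always return text for both spans, so toggling the visible metric
// never has to create or remove elements, only switch which one is shown.
function cellSpans(
  primary: number | null | undefined,
  secondary: number | null | undefined
): [string, string] {
  return [formatScore(primary), formatScore(secondary)];
}
```

Keeping the precision in one constant is one way to avoid the drift between tables that the commit fixes.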
Pushed a follow-up commit (
Summary
Comprehensive table-quality pass across every leaderboard page, driven by issues found in the latest review:
- […] `eval_metrics` the dataset registry listed (MAP on TREC DL, recall_1000 on BEIR) regardless of whether those metrics actually appeared in the data. The per-dataset shard now derives its columns from the actual run rows, the same approach the home matrix uses.
- […] `qg-chip-hidden` → `qg-itable-reapply` handshake).
- […] `ndcg_cut_10` → `nDCG@10`, `recall_1000` → `R@1k`, `recall_100` → `R@100`, `map` → `MAP`) on /datasets/[id] and the per-X pages too.
- […] `max-h-[70vh] overflow-y-auto`. The page scrolls naturally; the `sticky top-0` thead sticks to the viewport.
- […] `gpt-4.1`) not the provider-prefixed id (`openai/gpt-4.1`), matching the /methods index convention.
- […] `.flat.splade-pp-ed` / `.flat.bge-base-en-v1.5` for non-lexical paradigms; trec_eval references the qrels key from the dataset registry, not the topics key.
- […] `Q2D (FS)` etc.) not the raw method_id.
- […] `.run.txt` and queries.tsv: those are optional under the current schema; the path includes the `{retriever}` segment that PR #20 (Schema: optional artifacts + DL-HARD dataset entry) added.
- […] `MatrixCell.astro` (link + primary/secondary spans + sort hooks) and `FilterChips.astro` (groups + metric special-case + reapply event).

Test plan
- `python -m pytest reproducibility/tests/`: 44/44 passing
- `pnpm --filter @qg/leaderboard build`: clean (1113 pages built)
- […] `beir-v1.0.0-trec-covid.splade-pp-ed`, not `.flat.splade-pp-ed`
- […] `max-h-[70vh]` wrapper

🤖 Generated with Claude Code
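For reference, the metric-label mapping named in the summary can be sketched as a single lookup. The four pairs come from the description above; the `prettyMetric` name and the fallback to the raw id for unknown metrics are assumptions for this example.

```typescript
// Raw trec_eval-style metric ids mapped to the display labels listed in the summary.
const METRIC_LABELS = new Map<string, string>([
  ["ndcg_cut_10", "nDCG@10"],
  ["recall_1000", "R@1k"],
  ["recall_100", "R@100"],
  ["map", "MAP"],
]);

// Return the display label for a metric id.
// Assumption: unknown ids are shown unchanged rather than hidden.
function prettyMetric(id: string): string {
  const label = METRIC_LABELS.get(id);
  return label === undefined ? id : label;
}
```

Routing every table header through one such helper is one way to keep labels consistent across the home, dataset, and per-X pages.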